ABSTRACT

Colombi et al. 1999 (paper I) investigated the counts-in-cells statistics and
their respective errors in the τCDM Virgo Hubble Volume simulation. This
extremely large N-body experiment also allows a numerical investigation of the
cosmic distribution function, T(A), itself for the first time. For a statistic
A, T(A) is the probability density of measuring the value A in a finite galaxy
catalog. T was evaluated for the distribution of counts-in-cells, P_N, the
factorial moments, F_k, and the cumulants, ξ̄ and the S_N's, using the same
subsamples as paper I.

While paper I concentrated on the first two moments of T, i.e. the mean, the
cosmic error and the cross-correlations, here the function T is studied in its
full generality, including a preliminary analysis of joint distributions
T(A,B). The most significant and reassuring result for the analyses of future
galaxy data is that the cosmic distribution function is nearly Gaussian
provided its variance is small. A good practical criterion for the relative
cosmic error is that ΔA/A ≲ 0.2. This means that for accurate measurements,
the theory of the cosmic errors, presented by Szapudi & Colombi (1996) and
Szapudi, Colombi & Bernardeau (1999), and confirmed empirically by paper I, is
sufficient for a full statistical description and thus for a maximum
likelihood rating of models. As the cosmic error increases, the cosmic
distribution function T becomes increasingly skewed and is well described by a
generalization of the lognormal distribution. The cosmic skewness is
introduced as an additional free parameter. The deviation from Gaussianity of
T(F_k) and T(S_N) increases with order k, N, and similarly for T(P_N) when N
is far from the maximum of P_N, or when the scale approaches the size of the
catalog. For our particular experiment, T(F_k) and T(ξ̄) are well approximated
with the standard lognormal distribution, as evidenced by both the
distribution itself, and the comparison of the measured skewness with that of
the lognormal distribution.

Key words: large scale structure of the universe - galaxies: clustering - methods: 
numerical - methods: statistical 



1 INTRODUCTION 

Precision higher order statistics will become a reality when 
the new wide field surveys, such as the SDSS and the 
2dF, become available in the near future. These prospective 
measurements contain information relating to the regime 
of structure formation, to the nature of initial conditions, 
and to the physics of galaxy formation. The ability of such 
measurements to constrain models, in a broad sense, is in- 
versely proportional to the overlap between the distribution 
of statistics predicted by different theories for a finite galaxy 
survey. More precisely, maximum likelihood methods give the probability of
the particular measurements for each theory, or after inversion, the
likelihood of the theories themselves. This is an especially natural and
fruitful procedure for a Gaussian distribution, where the first two moments
are sufficient for a full statistical description. This simple case is
assumed for most analyses in the literature, and it motivates the special
attention given to the investigation of the errors, or standard deviations.
In general, however, the underlying distribution of measurements can be
strongly non-Gaussian, in which case the correct shape for the distribution
has to be employed for a maximum likelihood analysis. As a consequence, terms
such as "1-σ measurement" lose their usual meaning: a deviation of a few σ
from the average can be quite likely for a non-Gaussian distribution with a
long tail. It is therefore essential to ask two questions:

(i) In what regime is the Gaussian approximation valid 
for the distribution of the measured statistical quantities? 

(ii) If the Gaussian limit is violated, is there any rea- 
sonably simple, practical assumption which would enable a 
maximum likelihood analysis? 
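
To illustrate the remark above on long tails, the short Python sketch below compares the probability of an upward 3σ excursion for a Gaussian with that for a lognormal law of the same mean and variance. The lognormal is used here only as a convenient example of a skewed distribution; the function names and the sample values of the relative error are assumptions made for the illustration, not measurements.

```python
import math


def gaussian_upper_tail(n_sigma):
    """P(x > mean + n_sigma * sigma) for a Gaussian distribution."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def lognormal_upper_tail(n_sigma, rel_err):
    """Same probability for a lognormal with unit mean and
    standard deviation rel_err (i.e. relative error rel_err)."""
    kappa = math.log(1.0 + rel_err ** 2)       # variance of ln(A)
    threshold = 1.0 + n_sigma * rel_err        # mean + n_sigma * sigma
    # ln(A) is Gaussian with mean -kappa/2 and variance kappa
    z = (math.log(threshold) + 0.5 * kappa) / math.sqrt(kappa)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    gauss = gaussian_upper_tail(3.0)
    for rel_err in (0.1, 0.5, 1.0):            # illustrative values only
        ratio = lognormal_upper_tail(3.0, rel_err) / gauss
        print(f"relative error {rel_err}: tail ratio lognormal/Gaussian = {ratio:.1f}")
```

The ratio grows with the relative error, which is the sense in which a "few σ" event stops being rare once the distribution is skewed.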

This paper attempts to answer these questions by study- 
ing numerically the underlying distribution function of mea- 
surements for estimators of higher order statistics based 
on counts-in-cells. This complements the thorough numer- 
ical investigation of the errors undertaken by Colombi et 
al. (1999, hereafter paper I), and the theoretical investiga- 
tion of the errors exposed in a suite of papers by Szapudi & 
Colombi (1996, hereafter SC), Colombi, Szapudi, & Szalay 
(1998, hereafter CSS), and Szapudi, Colombi, & Bernardeau 
(1999, hereafter SCB). 

For a particular statistic A, T(A) denotes the probability density of
measuring a value A in a finite galaxy catalog. We consider the following
counts-in-cells statistics: factorial moments F_k, cumulants ξ̄ and S_N, void
probability P_0 and its corresponding scaling function σ = −ln(P_0)/F_1, as
well as the counts-in-cells distribution itself, P_N. A large τCDM N-body
experiment, E, generated by the VIRGO consortium (e.g., Evrard et al. 1999)
was divided into C_E = 4096 cubic subsamples, E_i, i = 1, ..., C_E, for
estimating numerically the cosmic distribution function, T(A). This was
rendered possible by the fact that this "Hubble Volume" simulation involves
10^9 particles in a cubic box of size 2000 h^-1 Mpc. A detailed description of
the simulation and the method we used to extract counts-in-cells statistics in
the full box E and in each of its subsamples E_i can be found in paper I.
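
The geometry of the subsampling can be sketched as follows. Since 4096 = 16³, a tiling of the 2000 h^-1 Mpc box by equal cubes implies subsamples of side 125 h^-1 Mpc. The regular-grid indexing below, and the names `subsample_index`, `BOX_SIZE` and `N_SIDE`, are assumptions for illustration only; the actual procedure is the one described in paper I.

```python
BOX_SIZE = 2000.0                      # h^-1 Mpc, from the text
N_SUB = 4096                           # number of cubic subsamples, from the text
N_SIDE = round(N_SUB ** (1.0 / 3.0))   # 16 subcubes per dimension
SUB_SIZE = BOX_SIZE / N_SIDE           # 125 h^-1 Mpc per subcube side


def subsample_index(x, y, z):
    """Map a position in [0, BOX_SIZE)^3 to a subsample index in [0, N_SUB).

    Assumes a regular grid of subcubes; the ordering of the indices is a
    hypothetical choice made for this sketch.
    """
    def cell(coord):
        # Clamp to the last cell to guard against coord == BOX_SIZE.
        return min(int(coord / SUB_SIZE), N_SIDE - 1)

    return (cell(x) * N_SIDE + cell(y)) * N_SIDE + cell(z)


if __name__ == "__main__":
    print(N_SIDE, SUB_SIZE)
    print(subsample_index(130.0, 10.0, 260.0))
```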

Paper I concentrated entirely on the first two moments of T(A), the average

    \langle A \rangle = \int A \, T(A) \, dA,    (1)

and the cosmic error

    (\Delta A)^2 = \langle (A - \langle A \rangle)^2 \rangle = \int (A - \langle A \rangle)^2 \, T(A) \, dA.    (2)

In the equations above, the mean ⟨A⟩ can differ from the true value, denoted
here A_true. The cosmic bias is defined as

    b_A = \frac{\langle A \rangle}{A_{\rm true}} - 1.    (3)

It is always present when indicators are constructed from unbiased estimators
in a nonlinear fashion, such as cumulants (e.g., SCB; Hui & Gaztanaga 1998,
hereafter HG).

The most relevant results of paper I are summarized 
next: 

(i) The measured average ⟨A⟩ is in excellent agreement with perturbation
theory, one-loop perturbation theory and extended perturbation theory (EPT)
in their respective range of applicability. These tests demonstrate the
quality of our numerical experiment.

(ii) The measured cosmic error ΔA/A is in accord with the theoretical
predictions of SC and SCB in their respective domain of validity. A few
percent accuracy is achieved in the weakly non-linear regime for the
factorial moments. On small scales the theory tends to overestimate the
errors, perhaps by a factor of two in the worst case, due to the approximate
nature of the hierarchical models representing the joint moments (SCB).

(iii) The cosmic bias is negligible compared to the errors in the full
dynamic range, as predicted by theory (SCB, see also HG for an opposing view).

(iv) Cross-correlations between statistics of order k and l are in general
agreement with theory considering the preliminary nature of the measurements.
The precision of the predictions, however, decreases with increasing
difference of orders, |k − l|. This suggests that the local Poisson model (SC)
loses accuracy, as expected.

The theory of the errors confirmed by paper I provides an excellent basis for
future maximum likelihood analyses of data whenever T is Gaussian. While this
was tacitly assumed by most previous works, this article examines for the
first time the range of validity of this assumption. To this end the cosmic
distribution function T(A) is examined numerically. In particular, one of the
parameters determining its shape, the cosmic skewness

    S = \frac{\langle (A - \langle A \rangle)^3 \rangle}{(\Delta A)^3},    (4)

is calculated as well. When Gaussianity is no longer a good approximation, new
Ansätze are proposed for characterizing T(A). In addition we perform a
preliminary analysis of the bivariate cosmic distributions T(A, B).
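
Given one measurement of A per subsample, the average, the cosmic error and the cosmic skewness defined above can be estimated by simple averages over the subsamples. The following Python sketch is a minimal illustration assuming equal-weight subsamples and plain (uncorrected) moment estimators; the function name `cosmic_moments` is hypothetical.

```python
import math


def cosmic_moments(values):
    """Return (mean, cosmic_error, skewness) of a list of measurements of A,
    one per subsample, using plain sample averages (an assumption of this
    sketch; no finite-sample corrections are applied)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((a - mean) ** 2 for a in values) / n
    error = math.sqrt(variance)                       # cosmic error, Delta A
    third = sum((a - mean) ** 3 for a in values) / n  # third central moment
    return mean, error, third / error ** 3


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    mean, error, skew = cosmic_moments([1.0, 2.0, 3.0, 4.0, 10.0])
    print(mean, error / mean, skew)
```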

The next section presents the estimates of T for the factorial moments, the
cumulants (including the variance of the counts), the void probability
distribution and its scaling function, and the counts-in-cells themselves. A
universal shape is found for T(A) which is well described in all regimes by a
generalized version of the lognormal distribution. In addition to the mean (1)
and variance (2), this depends on a third parameter, the cosmic skewness (4).
This is also investigated along with the resulting effective cosmic bias.
Section 3 presents the measured bivariate distributions, with explicit
comparison to theoretical predictions of SCB. Finally, section 4 discusses the
results in the context of maximum likelihood analysis of future surveys.
Readers unfamiliar with counts-in-cells statistics can consult Appendix A in
paper I for a concise summary of definitions and notation.



2 THE COSMIC DISTRIBUTION FUNCTION 

The main results of this section are displayed in figures 1-6. 
For simplicity figures 1, 3, and 5 will be referred to as type 
D, displaying distributions, while figures 2, 4, and 6 as type 
S, showing skewness. A general description of each type is followed by
results obtained for the cosmic distribution of the factorial moments (§ 2.1),
cumulants (§ 2.2), counts-in-cells (§ 2.3), and void probability with its
scaling function σ (§ 2.4). The cosmic skewness and the resulting effective
bias are discussed in § 2.5.

In all figures of type D, the results are displayed in a 
convenient system of coordinates. For any statistic A the 
normalized quantity 



    x_A \equiv \frac{\delta A}{\Delta A} = \frac{A - \bar{A}}{\Delta A}    (5)

is considered, where Ā = ⟨A⟩ to simplify notations. The average of x_A is zero
and its variance is unity by definition, which facilitates the comparison of
the plots. The disadvantage of this coordinate system is that the cosmic error
ΔA/A is not directly shown.

For reference, each figure of type D displays a Gaus- 
sian (solid curve), and lognormal distribution with the same 
variance and average (dots, e.g. Coles & Jones 1991): 



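    T(A) = \frac{1}{A \sqrt{2\pi\kappa}} \exp\left\{ -\frac{[\ln(A/\bar{A}) + \kappa/2]^2}{2\kappa} \right\},    (6)

with

    \kappa = \ln[1 + (\Delta A/\bar{A})^2].    (7)

The skewness of this distribution is given by

    S = (\Delta A/\bar{A})^3 + 3\,\Delta A/\bar{A}.    (8)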


For comparison, the skewness of the lognormal assumption is 
plotted with dotted lines on figures of type S. The amount of 
skewness of the lognormal is a function of the cosmic error, 
i.e. more skewness on the figures indicates a larger cosmic 
error which is hidden by the choice of the coordinate system. 
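
The lognormal skewness formula can be checked from the raw moments of the lognormal law, E[A^n] = exp(nμ + n²κ/2) with μ = ln Ā − κ/2. The Python sketch below does this numerically; the function names and the test values of the relative error are assumptions chosen for the illustration.

```python
import math


def lognormal_raw_moment(n, mean, kappa):
    """E[A^n] when ln(A) is Gaussian with mean ln(mean) - kappa/2 and variance kappa."""
    mu = math.log(mean) - 0.5 * kappa
    return math.exp(n * mu + 0.5 * n * n * kappa)


def lognormal_skewness(mean, rel_err):
    """Skewness of a lognormal with the given mean and relative error,
    computed from its first three raw moments."""
    kappa = math.log(1.0 + rel_err ** 2)
    m1 = lognormal_raw_moment(1, mean, kappa)
    m2 = lognormal_raw_moment(2, mean, kappa)
    m3 = lognormal_raw_moment(3, mean, kappa)
    variance = m2 - m1 ** 2
    third_central = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    return third_central / variance ** 1.5


if __name__ == "__main__":
    for rel_err in (0.1, 0.2, 1.0):        # illustrative values only
        closed_form = rel_err ** 3 + 3.0 * rel_err
        print(rel_err, lognormal_skewness(2.5, rel_err), closed_form)
```

Both columns agree, and the skewness is close to 3ΔA/A when the relative error is small.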

In addition, a "generalized lognormal distribution" is 
introduced (dashes on figures of type D): 



    T(A) = \frac{s}{\Delta A\,[s(A - \bar{A})/\Delta A + 1]\,\sqrt{2\pi\eta}}
           \times \exp\left( -\frac{\{\ln[s(A - \bar{A})/\Delta A + 1] + \eta/2\}^2}{2\eta} \right),    (9)

    \eta = \ln(1 + s^2),    (10)

where s is an adjustable parameter. It is fixed by the requirement that the
analytical function (9) have identical average, variance, and skewness,
S = s³ + 3s, with the measured T(A). Having one more parameter, form (9)
characterizes the shape of the function T(A) better than the other two
functions, especially for the large-δA tail. As will be shown next, it is an
excellent approximation for the underlying probability distribution in all
regimes for all statistics. This robust universality is the most striking
result of this article.
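
A minimal Python sketch of the generalized lognormal may help fix ideas. It treats y = s(A − Ā)/ΔA + 1 as a lognormal variable of unit mean and variance s², so that the density in A carries the Jacobian factor s/ΔA, and it inverts S = s³ + 3s in closed form through the identity sinh(3x) = 3 sinh(x) + 4 sinh³(x). The function names, the midpoint integrator and the numerical values are assumptions of this sketch, not part of the original analysis; the code requires s > 0.

```python
import math


def shape_parameter(skewness):
    """Solve S = s**3 + 3*s for s (unique real root)."""
    return 2.0 * math.sinh(math.asinh(0.5 * skewness) / 3.0)


def generalized_lognormal_pdf(a, mean, err, s):
    """Generalized lognormal density with the given mean, cosmic error
    and shape parameter s > 0; zero outside its support."""
    y = s * (a - mean) / err + 1.0
    if y <= 0.0:
        return 0.0
    eta = math.log(1.0 + s * s)
    norm = s / (err * y * math.sqrt(2.0 * math.pi * eta))
    return norm * math.exp(-(math.log(y) + 0.5 * eta) ** 2 / (2.0 * eta))


def integrate(func, lo, hi, steps=50000):
    """Midpoint-rule integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    mean, err = 2.0, 0.5                 # illustrative values only
    s = shape_parameter(3.0)             # shape matching a skewness of 3
    lo, hi = mean - err / s, mean + 60.0 * err
    pdf = lambda a: generalized_lognormal_pdf(a, mean, err, s)
    print("normalization:", integrate(pdf, lo, hi))
    print("mean:", integrate(lambda a: a * pdf(a), lo, hi))
```

With this parametrisation the mean and variance are fixed independently of s, which is why s can be adjusted to the measured skewness alone.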

The cosmic distribution function, as with any measurement from finite data,
is subject to both measurement and cosmic errors (the "error on the error
problem", cf. SC). The measurement error on T, due to the finite number of
subsamples extracted from the whole simulation, can be calculated via
straightforward error propagation. It essentially corresponds to the usual
1/√C_E factor, where C_E is the number of subsamples. This is plotted on all
figures of type D as errorbars. On figures of type S no errorbars are shown,
since this would require an accurate estimate up to the 6th moment of the
cosmic distribution T(A). The excellent agreement between cosmic error
measurements and theory (paper I) indicates that the number of subsamples is
sufficient and thus the resulting errorbars should be fairly small. Similar
arguments suggest that the simulation volume was sufficiently large to render
the cosmic error on the cosmic distribution negligible.
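
One simple way to obtain errorbars with the 1/√C_E scaling mentioned above is to treat the number of subsamples falling in each histogram bin as a binomial variable. The Python sketch below follows that assumption; it is not necessarily the error propagation actually used for the figures, and the function name and binning are hypothetical.

```python
import math


def histogram_with_errors(x_values, x_min, x_max, n_bins):
    """Estimate the density of x in equal-width bins.

    Returns a list of (bin_center, density, density_error), where the error
    assumes binomial statistics for the count in each bin."""
    n = len(x_values)
    width = (x_max - x_min) / n_bins
    counts = [0] * n_bins
    for x in x_values:
        k = math.floor((x - x_min) / width)
        if 0 <= k < n_bins:                 # values outside the range are ignored
            counts[k] += 1
    result = []
    for k, count in enumerate(counts):
        p = count / n                       # fraction of subsamples in the bin
        p_err = math.sqrt(p * (1.0 - p) / n)
        center = x_min + (k + 0.5) * width
        result.append((center, p / width, p_err / width))
    return result


if __name__ == "__main__":
    # Made-up normalized measurements, for illustration only.
    for row in histogram_with_errors([0.1, 0.2, 0.6, 0.9], 0.0, 1.0, 2):
        print(row)
```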



2.1 Factorial Moments 

Figure 1 displays T(F_k) for 1 ≤ k ≤ 4 and various scales
ℓ = 1, 7.8, 62.5 h^-1 Mpc.

The agreement with the generalized lognormal distribution is excellent, but
even the lognormal gives an adequate description. The deviation from a
Gaussian is pronounced whenever the relative cosmic error ΔF_k/F_k is
significantly larger than unity. While the figures do not show the cosmic
error directly, the skewness of T(F_k) is a reliable indication. It increases
with the order k since ΔF_k/F_k also increases with k. Figure 2 shows directly
the quantity S measured for T(F_k) along with the lognormal value (8). The
agreement shows that the lognormal model yields an excellent approximation.

Figure 2. The skewness S = ⟨(F_k − F̄_k)³⟩/(ΔF_k)³ as a function of scale. The
triangles, squares, pentagons and hexagons respectively correspond to k = 1,
2, 3 and 4. The dotted lines correspond to an underlying lognormal
distribution (8); the orders increase from bottom to top. The errors on the
measurement have not been estimated since this would require a complicated
calculation depending on the estimate of up to the 6th moment of T(F_k).

Fig. 1, in conjunction with the measurements of the cosmic error in Paper I,
suggests that

    \Delta A / A \lesssim \Delta_{\rm crit}, \qquad \Delta_{\rm crit} = 0.2,    (11)

is a practical criterion for the validity of the Gaussian approximation.

2.2 Cumulants 

Figure 3 is analogous to Fig. 1, showing functions T(ξ̄), T(S_3) and T(S_4)
for the biased estimators. As was shown in paper I, the bias is negligible
compared to the cosmic errors, thus correction is not necessary. The agreement
with the lognormal is more approximate than for T(F_k), except for the
variance ξ̄. Indeed, the skewness of T(S_N) is in general different from the
lognormal prediction, as illustrated by Fig. 4. On small scales it is larger
than predicted by equation (8) while on large scales where edge effects
dominate it is much smaller. The generalized lognormal (9) can still account
for the shape of T(S_N) quite well, especially for the large-S_N tail.

The cosmic skewness of T(S_k) is fairly small on large scales. This is a
natural consequence of the fact that cumulants are not subject to the
positivity constraint S_k ≥ 0, as is the case for factorial moments. On large
scales, the measured S_k may well be positive or negative, similarly with ξ̄ on
extremely large scales. As a result, the left-hand tail of the distribution is
more pronounced in both lower right panels of Fig. 3 than in the corresponding
figure for factorial moments, and T(S_3) is almost Gaussian in the middle
right panel.

Figure 1. The cosmic distribution function of measurements T(F_k) shown as a
function of δF_k/ΔF_k as explained in the text. The scale of the measurements
ℓ = 1, 7.8 and 62.5 h^-1 Mpc is indicated on each panel. The order k = 1, 2,
3, 4 increases from top to bottom. The solid, dotted, and dashed curves
correspond to the Gaussian, lognormal, and generalized lognormal [eq. (9)]
distributions, respectively. While the coordinate system of the Figure does
not display the value of the cosmic error directly, the amount of skewness of
the lognormal distribution is an indicator of the magnitude ΔF_k/F_k. The
errorbars show the measurement error as discussed at the beginning of § 2.

Rule (11) for the Gaussian limit still applies, at least for ξ̄, and perhaps a
slightly more stringent condition should be chosen for cumulants of higher
order. T(S_3) is fairly skewed even though the measured cosmic error is
slightly below the threshold value for ℓ = 1 h^-1 Mpc and ℓ = 7 h^-1 Mpc (see
paper I).



2.3 Counts-in-cells 

Figure 5 shows the function T(P_N) in various cases. The upper panels focus
on a small scale ℓ ≃ 1 h^-1 Mpc. In this regime, the CPDF and −ΔP_N/P_N are
decreasing functions of N as demonstrated in paper I. Once again, the validity
of the Gaussian approximation depends on the size of the cosmic error. As a
result, T(P_N) is nearly Gaussian for N = 1 and becomes more and more skewed
as N increases. The lognormal approximation appears to be adequate within the
errors, although it is slightly too skewed as illustrated by Fig. 6.

The middle panels show an intermediate scale ℓ ≃ 7.8 h^-1 Mpc. On these scales
(cf. paper I) both the CPDF and the cosmic error have a unimodal behaviour
with an extremum (maximum for the CPDF and correspondingly minimum for the
errors) for N ≃ N_max = 26. This explains why for the chosen values of N = 5,
50, and 500, function T(P_N) is skewed, approximately Gaussian, and skewed
again respectively. For N = 5 lognormal is an excellent approximation, while
the skewness for N = 500 is somewhat less than that of a lognormal.

Finally, the lower panels display the largest available scale
ℓ = 62.5 h^-1 Mpc. The behaviour of P_N and ΔP_N/P_N is similar to the
previous case, with the extremum shifted to N ≃ N_max = 30000. In this case,
the cosmic error is always large, at least of order fifty percent (cf. paper
I). All the curves are thus significantly skewed for the chosen values of
N = 25 000, 30 000 and 40 000. The agreement with the lognormal assumption is
somewhat inaccurate, although the generalized lognormal improves the fit,
especially for the left-hand panel. Note that the apparently abrupt limit for
small values of δP_N/ΔP_N is due to the positivity constraint P_N ≥ 0. This
constraint becomes quite severe when the average value is much smaller than
the errors. While there is still plenty of dynamic range for upscattering,
there is a hard restriction for downscattering. This is only partly taken into
account in our generalized lognormal model, and any modifications in this
respect are left for future work. Finally, the practical criterion (11) is
again valid for determining the validity of the Gaussian approximation.

Note that owing to the finite number C = 512³ of sampling cells (see paper I),
the CPDF is necessarily a multiple of 1/C. This quantization could cause
contamination of T(P_N) unless P_N ≫ 1/C ≃ 10^-8.13. The condition
P_N > 10^-6 adopted corresponds to at least ≃ 100 cells per subsample on
average with N particles. Despite that, a small amount of contamination might
still persist for δP_N ≃ −P_N, i.e. at the left side of the plots on figure 5.
The same effect might also alter the tail of the counts-in-cells measurements
presented in paper I, although not significantly.
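
The numbers quoted for the quantization of the CPDF follow from simple arithmetic, reproduced in the Python sketch below. The function names are hypothetical; only C = 512³ and the threshold of 10^-6 are taken from the text.

```python
import math

N_CELLS = 512 ** 3          # number of sampling cells C, from the text
P_THRESHOLD = 1e-6          # adopted lower limit on P_N, from the text


def quantization_step(n_cells):
    """Smallest non-zero value the measured CPDF can take, 1/C."""
    return 1.0 / n_cells


def cells_at_threshold(n_cells, p_min):
    """Average number of cells containing N particles when P_N = p_min."""
    return n_cells * p_min


if __name__ == "__main__":
    step = quantization_step(N_CELLS)
    print("log10(1/C) =", round(math.log10(step), 2))
    print("cells at threshold =", round(cells_at_threshold(N_CELLS, P_THRESHOLD)))
```

The threshold therefore sits about two orders of magnitude above the quantization step, consistent with the "at least about 100 cells" statement.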



2.4 Void Probability and Scaling Function 

According to the investigations in paper I, the cosmic error on P_0 and σ
increases steadily with scale up to a sudden transition on scales
ℓ ≃ 5 h^-1 Mpc where it becomes large or infinite. This behavior was studied
extensively by CBS where more of the details can be found. The most relevant
consequence here is that in the available dynamic range the cosmic error is
small, and T(P_0) and T(σ) are nearly Gaussian. For this reason it would be
superfluous to print the corresponding figures.

Figure 4. Same as in Fig. 2, but we consider here the skewness of ξ̄ (left
panel), S_3 (middle panel) and S_4 (right panel).

Figure 5. Same as in Fig. 1, but now the distribution function of
measurements T(P_N) is shown as a function of δP_N/ΔP_N for various scales and
values of N as indicated on each panel.



2.5 Cosmic Skewness and Cosmic Bias 



According to Figs. 1-6 the degree of skewness of the cosmic distribution
function increases with the order k and with |N − N_max|, where N_max is the
value for which P_N reaches its maximum. The cosmic skewness is already
significant for third order statistics, F_3 and S_3. An important consequence
of the large cosmic skewness is that the maximum of T(A), i.e. the most likely
measurement, is shifted to the left from the ensemble average on Figs. 1, 3
and 5. Maximizing the Ansatz (9), which is always a good fit to the cosmic
distribution function, yields

    b_A = \frac{A_{\max}}{\bar{A}} - 1 = \frac{\Delta A}{\bar{A}\, s} \left[ \frac{1}{(1 + s^2)^{3/2}} - 1 \right],    (12)

where b_A is the effective cosmic bias. Since s > 0, it is negative, and its
absolute value is smaller than the cosmic error,

    |b_A| \lesssim 0.66\, \frac{\Delta A}{\bar{A}}.    (13)

For a lognormal distribution, s = ΔA/Ā,

    b_A = [1 + (\Delta A/\bar{A})^2]^{-3/2} - 1, \qquad |b_A| < 1.    (14)

The effective cosmic bias becomes increasingly significant when the cosmic
error is large. Similarly to the cosmic bias (SCB), b_A ≃ −(3/2)(ΔA/A)², from
expanding eq. (14), in the small error regime.

Figure 6. The skewness S of T(P_N) as a function of N for three scales,
ℓ = 1 h^-1 Mpc (left curve), ℓ = 7.8 h^-1 Mpc (middle curve) and
ℓ = 62.5 h^-1 Mpc (right curve). The dotted curves give the lognormal
prediction, which is always larger than the measurement.

The phenomenon of effective bias was already pointed out by SC (and
preliminarily investigated by Colombi, Bouchet & Schaeffer, 1994). Since A_max
is the most likely value of A, the single measurement available in a catalog
of the neighbouring Universe is likely to yield a lower than average value.
This is true even for an unbiased indicator such as F_k or P_N. Unfortunately,
this effect cannot be corrected for, but it can be taken into account in the
framework of the maximum likelihood approach using the above results on the
shape of T(A).
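
The size of the effective cosmic bias can be explored numerically. In the generalized lognormal, the variable y = s(A − Ā)/ΔA + 1 is lognormal with unit mean and variance s², and the mode of such a law lies at y = (1 + s²)^(-3/2); the shift of the most likely value follows directly. The Python sketch below evaluates it, scans s to locate the largest shift in units of the cosmic error, and checks the small-error limit of the lognormal case. Function names and the scanning grid are assumptions of this sketch.

```python
def effective_bias(rel_err, s):
    """Relative shift of the most likely value, A_max / mean - 1, for a
    generalized lognormal with relative error rel_err and shape s > 0."""
    return rel_err / s * ((1.0 + s * s) ** -1.5 - 1.0)


def lognormal_effective_bias(rel_err):
    """Special case s = rel_err (standard lognormal)."""
    return effective_bias(rel_err, rel_err)


def max_bias_over_error(s_max=10.0, steps=100000):
    """Largest |bias| / rel_err found on a uniform grid of s in (0, s_max]."""
    best = 0.0
    for i in range(1, steps + 1):
        s = s_max * i / steps
        best = max(best, abs(effective_bias(1.0, s)))
    return best


if __name__ == "__main__":
    print("max |b_A| / (Delta A / A):", round(max_bias_over_error(), 3))
    for rel_err in (0.05, 0.2, 1.0):          # illustrative values only
        print(rel_err, lognormal_effective_bias(rel_err), -1.5 * rel_err ** 2)
```

The scan returns a maximum slightly below 0.66, reached for s somewhat below unity, and the lognormal bias approaches −(3/2)(ΔA/A)² when the error is small.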



3 BIVARIATE COSMIC DISTRIBUTION FUNCTION: A PRELIMINARY ANALYSIS

Figures 7 and 8 display contours of the joint cosmic distribution T(A,B)
(solid lines) for factorial moments and cumulants, respectively. For
comparison the Gaussian limit is shown,

    T(A,B) = \frac{1}{2\pi\, \Delta A\, \Delta B \sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2} Q(A,B) \right],    (15)

    Q(A,B) = \frac{1}{1 - \rho^2} \left[ x_A^2 - 2\rho\, x_A x_B + x_B^2 \right],    (16)

where ρ = ⟨δx_A δx_B⟩ is the cross-correlation coefficient. Dot-dashes display
the above function with the measured ρ, ΔA/A and ΔB/B, while long dashes
represent the same function but with the parameters inferred from the theory
of SCB with the E²PT model (see paper I for details). The contours, which
correspond in the Gaussian limit to the 1σ (thin curves) level, Q(A,B) = 1,
and the 2σ (thick curves) level, Q(A,B) = 4, are displayed in the coordinate
system of the measured x_A and x_B.
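
The Gaussian reference contours can be evaluated with a few lines of Python. The sketch below computes the quadratic form Q and the bivariate Gaussian density in the normalized coordinates, and also the probability enclosed by a contour of constant Q, which for a bivariate Gaussian is 1 − exp(−Q/2). The function names are hypothetical and the enclosed-probability helper is an addition for orientation, not a quantity quoted in the text.

```python
import math


def q_form(x_a, x_b, rho):
    """Quadratic form Q in the normalized coordinates x_A, x_B (|rho| < 1)."""
    return (x_a * x_a - 2.0 * rho * x_a * x_b + x_b * x_b) / (1.0 - rho * rho)


def bivariate_gaussian(x_a, x_b, err_a, err_b, rho):
    """Gaussian limit of the joint density T(A, B), per unit A and B."""
    norm = 1.0 / (2.0 * math.pi * err_a * err_b * math.sqrt(1.0 - rho * rho))
    return norm * math.exp(-0.5 * q_form(x_a, x_b, rho))


def enclosed_probability(q):
    """Probability that a bivariate Gaussian falls inside the contour Q = q."""
    return 1.0 - math.exp(-0.5 * q)


if __name__ == "__main__":
    for q in (1.0, 4.0):
        print(f"Q = {q}: enclosed probability = {enclosed_probability(q):.3f}")
```

In two dimensions the Q = 1 and Q = 4 contours therefore enclose about 39 and 86 per cent of the probability in the Gaussian limit, less than the one-dimensional 1σ and 2σ intervals.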

On ℓ = 7.1 h^-1 Mpc scales the theoretical predictions are expected to match
the second order moments of T for factorial moments, and even the
cross-correlations (see Paper I). This is illustrated by Fig. 7, where the
long-dashed ellipses coincide well with the dot-dashed ones. For the cumulants
the theory overestimates the errors slightly, which is reflected in the
contours of Fig. 8, although cross-correlations are still reasonable, as
indicated by the orientation of the ellipses.

The departure from the Gaussian limit is significant, except for the upper
left panel of Figs. 7 and 8, and increases with order, in accord with the
findings of the previous section. The contrast with Gaussianity increases with
the cosmic error, and thus with the order considered. With the exception of
N̄, F_2, ξ̄ and S_3, the measured cosmic error violates (11) at
ℓ = 7.1 h^-1 Mpc (see paper I). Moreover, as shown previously, criterion (11)
should be strengthened for cumulants S_k, k ≥ 3. In conclusion, condition (11)
distinguishes the Gaussian limit for T(A,B) adequately when applied to both
statistics A and B.

Similarly to the monovariate distribution (§ 2), function T(A,B) develops
skewness and a significant tail for large values of x = (x_A, x_B) when rule
(11) is broken. There are three notable consequences:

(i) The effective cosmic bias (§ 2.5) is present again, i.e. 
the maximum of T is shifted from the average towards the 
lower left corner of the panels. 

(ii) The contours tend to cover a smaller area than for 
the Gaussian limit. 

(iii) As a result of the positivity constraint, there is a well 
defined lower vertical/horizontal bound in some panels, e.g.,
for XFi, F4 > 0. 



4 SUMMARY AND DISCUSSION 

This paper has presented an experimental study of the cosmic distribution
function of measurements T(A), where A is an indicator of a statistic related
to counts-in-cells. The cosmic distribution was considered for the factorial
moments F_k, cumulants ξ̄ and S_N, the void probability P_0 with its scaling
function, σ = −ln(P_0)/F_1, and finally the counts-in-cells P_N themselves. To
analyse properties of the function T(A), we used a state of the art τCDM
simulation divided into 4096 sub-cubes large enough themselves to represent a
full galaxy catalog. The statistics mentioned above were extracted from each
subsample, and the resulting distribution of measurements was used to estimate
T(A).

Figure 7. The joint cosmic distribution function for factorial moments,
T(F_k, F_l). Thin and thick solid contours are displayed for two values of T
which would correspond respectively to 1σ and 2σ contours in the Gaussian
limit. The latter is shown as thin and thick dot-dashes. For comparison, the
analytic prediction of SCB for E²PT is also plotted with thin and thick long
dashes corresponding to the Gaussian limit with theoretical cosmic errors and
cross-correlation coefficient. The scale of the measurement is
ℓ = 7.8 h^-1 Mpc as displayed on each panel. The image used to draw contour
plots has 30² pixels. It was generated using bilinear interpolation from
another array with logarithmic binning in each coordinate in order to reduce
the errors on the estimate of function T(A,B) in each bin.

Figure 8. Same as in Fig. 7, but for the average count F_1 and the cumulants,
ξ̄, S_3 and S_4.

While paper I concentrated on the first two moments of the cosmic
distribution, the average and the errors, here the focus was shifted towards
the general shape of function T itself, including its skewness, the cosmic
skewness. The main results of this analysis are the following:

(i) In contrast with popular belief, the cosmic distribution is not Gaussian
in general. The most reassuring result is, however, that the Gaussian
approximation appears to be valid whenever the cosmic errors are small,
typically ΔA/A ≲ 0.2. This result is quite robust and it is insensitive to the
particular statistic considered (except that a slightly more stringent
condition might be chosen for cumulants S_k, k ≥ 3). This means that for any
quantity which can be reliably measured from a survey, a Gaussian error
analysis should be valid.

© 1998 RAS, MNRAS 000,

10 I. Szapudi et al.

When the relative cosmic error ΔA/A becomes significant, T becomes
increasingly skewed. Since ΔF_k/F_k and ΔS_k/S_k increase with k (SC, paper
I), and ΔP_N/P_N with |N − N_max|, where N_max is the location of the maximum
of the CPDF, so does the cosmic skewness, which eventually results in the
breakdown of the Gaussian approximation. Functions T(F_k) and T(ξ̄) are well
approximated by a lognormal law. Otherwise, a third order parametrisation
matching the average, the variance and the skewness of the observed
distribution is necessary, and in general sufficient. Such a generalization of
the lognormal distribution is proposed and found to be in agreement with the
measurements in all regimes investigated. Note that there are other
alternatives such as the Edgeworth expansion (e.g., Juszkiewicz et al. 1995)
or the skewed lognormal approximation of Colombi (1994). The latter consists
of applying the Edgeworth expansion to log(A). This method, when applicable,
significantly extends the domain of validity of the Edgeworth expansion,
normally only useful in the weakly non-Gaussian limit ΔA/A ≲ 0.5.

(ii) While paper I examined the cosmic bias resulting 
from the non-linear construction of certain estimators, here 
a new phenomenon was pointed out, which is similar in ef- 
fect, but different in nature: the effective cosmic bias. It 
affects all estimators, including unbiased ones, and is a re- 
sult of the cosmic skewness. Whenever the cosmic errors are 
large, the cosmic distribution function develops a skewness 
corresponding to a long tail. As a result, the most likely 
measurement will be smaller than the average. Such a phe- 
nomenon was pointed out earlier in SC, and here it has been 
found to be universal. As SCB and paper I found that the 
cosmic bias is usually insignificant compared to the cosmic 
errors, it is likely that the effective cosmic bias is respon- 
sible for some of the conspicuously low measurements from 
small galaxy catalogs. This is in contrast with the conjec- 
ture of Hui & Gaztanaga (1998, hereafter HG), who assumed 
that the cosmic bias resulting from the use of biased esti- 
mators could explain this phenomenon. The effective cosmic 
bias renders correction for the cosmic bias useless, in con- 
trast with the proposition of HG. The effective cosmic bias 
(and the less significant cosmic bias if any) can be taken 
into account in the framework of a full maximum likelihood 
analysis, which relies on the shape of the cosmic distribution 
function approximated with sufficient accuracy. 

(iii) A preliminary investigation of the joint distribution T(A,B) was
performed for factorial moments and cumulants. It confirms the validity of the
above points (i) and (ii) for the cosmic bivariate distribution. In
particular, a practical criterion for the validity of the Gaussian limit is
that the cosmic error for both estimators be small enough, typically
ΔA/A ≲ 0.2 and ΔB/B ≲ 0.2. This result can be safely generalized to N-variate
distribution functions, thus providing the basis of full multivariate maximum
likelihood analysis of data in the Gaussian limit.

We have not attempted to develop a more accurate multi- 
variate approximation than (multivariate) Gaussian as this 
would go beyond the scope of this paper. However, we 
conjecture that an extension of our generalized lognormal



errors are small; but this is precisely the criterion for the 
Gaussian limit as we have shown previously. A generalization of
the lognormal distribution expanding the logarithm of the 
statistics via the multivariate Edgeworth technique provides 
a potential improvement of this method. 

It is worth noting that the behaviour of the cosmic dis- 
tribution function is expected to be extremely robust with 
respect to the particular model studied in this paper, τCDM.
For example, SC, in their preliminary investigations, found 
essentially the same universal behaviour in Rayleigh-Levy 
fractals. Moreover, as discussed more extensively in Paper 
I, the results are sufficiently stable that the usual worries of 
galaxy biasing (not to be confused with cosmic and effective 
cosmic bias) and redshift distortions are unlikely to change 
them qualitatively. Indeed the shape of the cosmic distribu- 
tion function is almost entirely determined by the magnitude 
of the cosmic error, and it is insensitive to which statistic is 
considered. The powerful universality found among entirely 
different statistics is likely to carry over when the two ef- 
fects mentioned above, which are subtleties in comparison 
with the range of statistics investigated, are taken into ac- 
count. 

The results found in the present work and in paper I are 
encouraging for investigations in future large galaxy catalogs 
and for problems related to data compression (e.g. Bond 
1995; Vogeley & Szalay 1996; Tegmark, Taylor & Heav- 
ens 1996; Bond, Jaffe & Knox 1998; Seljak 1998). For ex- 
ample, the cosmic error on factorial moments is expected 
to be small on a large dynamic range in the SDSS (see, 
e.g. CSS), implying according to the above findings that 
the cosmic distribution function should be nearly Gaussian 
in this regime. In that case, theory of the cosmic errors 
and cross-correlations, outlined in SC, CSS and SCB and 
thoroughly tested in paper I, will be sufficient for full mul- 
tivariate maximum likelihood analyses. Preliminary inves- 
tigations on current surveys are being undertaken by Sza- 
pudi, Colombi & Bernardeau (1999b) and Bouchet, Colombi 
& Szapudi (1999). Similarly the theoretical background is 
currently being developed for future weak lensing surveys 
(Bernardeau, Colombi & Szapudi 1999), where statistical
analyses will be conducted with indicators very close to
counts-in-cells (see, e.g. Bernardeau, Van Waerbeke & Mel-
lier 1997; Mellier 1998; Jain, Seljak & White 1999).